Name: __ Class: __ Date: __
Difficulty: Easy
This activity is inspired by the work of the Ocean Cleanup Project and #TeamSeas, started by YouTubers MrBeast and Mark Rober. It uses data from Meijer L. J. J. et al., 2021, Sci. Adv., "More than 1000 rivers account for 80% of global riverine plastic emissions into the ocean", DOI: 10.1126/sciadv.aaz5803, which is provided here and has been downloaded into the Meijer2021_midpoint_emissions directory for this activity.
Explore the Ocean Cleanup Project's interactive map version of this data on their website here.
You can see the full #TeamSeas campaign to remove plastic from the ocean here (I am in no way affiliated with any of the campaigns or organisations listed above).
We are going to explore this river pollution dataset using GeoPandas. GeoPandas is built on Pandas and allows us to work with geospatial data.
You will need GeoPandas, contextily, matplotlib with ipympl (which provides the %matplotlib widget backend and builds on ipywidgets), and adjustText installed to run all the code.
# run me
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as cx
from geopandas.tools import sjoin
from numpy import log
from adjustText import adjust_text
%matplotlib widget
We can use GeoPandas to read in the special shapefile which contains our geospatial data. Go ahead and run the next code cell.
# run me
rivers = gpd.read_file("./Meijer2021_midpoint_emissions/Meijer2021_midpoint_emissions.shp")
Let's look and see how many rivers are in the dataset using .shape.
# run me
rivers.shape
(31819, 2)
There are 31,819 rivers included in this dataset! Let's look at the first 5 rows using .head().
# run me
rivers.head()
| | dots_exten | geometry |
|---|---|---|
| 0 | 0.164904 | POINT (168.79792 -46.58083) |
| 1 | 0.124932 | POINT (168.34875 -46.44708) |
| 2 | 1.213370 | POINT (168.33708 -46.41875) |
| 3 | 0.121138 | POINT (168.02125 -46.35792) |
| 4 | 0.197533 | POINT (169.81125 -46.34375) |
The dots_exten column tells us the total annual plastic emissions in metric tons and the geometry column contains the POINTs showing the locations of all the rivers.
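To make the two columns more concrete, here is a minimal sketch that rebuilds only the five rows printed above as a toy GeoDataFrame (the name sample is made up for this illustration and it is not the full dataset). It shows what .shape counts and how a POINT stores its coordinates.

```python
import geopandas as gpd

# Toy data: the five rows printed by rivers.head() above, not the full dataset.
emissions = [0.164904, 0.124932, 1.213370, 0.121138, 0.197533]
lons = [168.79792, 168.34875, 168.33708, 168.02125, 169.81125]
lats = [-46.58083, -46.44708, -46.41875, -46.35792, -46.34375]

sample = gpd.GeoDataFrame(
    {"dots_exten": emissions},
    geometry=gpd.points_from_xy(lons, lats),  # x = longitude, y = latitude
    crs="EPSG:4326",
)

print(sample.shape)               # (number of rows, number of columns)
print(sample.geom_type.unique())  # geometry types present in the table

# Each POINT stores longitude as x and latitude as y.
first_point = sample.geometry.iloc[0]
print(first_point.x, first_point.y)
```

The same calls should work on rivers itself, where each row is one river mouth.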
It would be good to know the maximum and minimum values of dots_exten. Use the next two code cells to print out the min and max values. Treat rivers as a normal Pandas DataFrame.
# print out the min dots_exten value
rivers['dots_exten'].min()
0.0
# print out the max dots_exten value
rivers['dots_exten'].max()
62591.9
At least one river in the dataset emits $0\ t$ of plastic each year, or something very close to it.
On the other hand, the most polluting river emits $62,591.9\ t$ of plastic each year!
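Knowing the extreme values naturally raises the question of which rivers they belong to. The sketch below is one possible way to find out, using the Pandas methods idxmax() and idxmin() on a toy GeoDataFrame built from the five rows shown earlier (sample is a made-up name, and the values are not the real dataset extremes).

```python
import geopandas as gpd

# Toy data: the five rows printed by rivers.head(), not the full dataset.
sample = gpd.GeoDataFrame(
    {"dots_exten": [0.164904, 0.124932, 1.213370, 0.121138, 0.197533]},
    geometry=gpd.points_from_xy(
        [168.79792, 168.34875, 168.33708, 168.02125, 169.81125],
        [-46.58083, -46.44708, -46.41875, -46.35792, -46.34375],
    ),
    crs="EPSG:4326",
)

# idxmax()/idxmin() return the index label of the largest/smallest value.
max_label = sample["dots_exten"].idxmax()
min_label = sample["dots_exten"].idxmin()

# .loc[label] pulls out the whole row, including the river's location.
max_row = sample.loc[max_label]
print(max_label, max_row["dots_exten"], max_row["geometry"])
print(min_label, sample.loc[min_label, "dots_exten"])
```

Swapping sample for rivers should return the row, and therefore the location, of the most polluting river in the full dataset.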
Before we move on we should know which Coordinate Reference System (CRS) the data is stored in. Run the code below.
# run me
rivers.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
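The CRS is WGS 84 (EPSG:4326), a geographic coordinate system that stores positions as latitude and longitude in degrees. Strictly speaking it is not a projection, because the coordinates are angles on an ellipsoid rather than distances on a flat map. More info on reference systems can be found here.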
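If the data ever needs to sit on top of web map tiles, it usually has to be reprojected first. As a hedged illustration (the choice of EPSG:3857, Web Mercator, is an assumption made for this example, and sample is a two-row toy table), to_crs() converts the coordinates from degrees to metres:

```python
import geopandas as gpd

# Toy data: the first two rows printed by rivers.head(), not the full dataset.
sample = gpd.GeoDataFrame(
    {"dots_exten": [0.164904, 0.124932]},
    geometry=gpd.points_from_xy([168.79792, 168.34875], [-46.58083, -46.44708]),
    crs="EPSG:4326",
)

# to_crs() returns a reprojected copy; the original is left unchanged.
sample_web = sample.to_crs(epsg=3857)

print(sample.crs.is_geographic)     # coordinates are angles in degrees
print(sample_web.crs.is_projected)  # coordinates are now distances in metres
print(sample.geometry.iloc[0])
print(sample_web.geometry.iloc[0])
```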
There are two easy tools which we can use to visualise this dataset.
The first is calling .plot() on our GeoPandas dataset. This will plot all the river points using Matplotlib.
Run the code below to see the figure.
# run me
rivers.plot(column='dots_exten')
<AxesSubplot:>
This is a great way to quickly visualise the data, but it looks terrible and has no scale/colourbar!
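One possible way to add the missing colourbar is to pass legend=True to .plot(). The sketch below also uses a logarithmic colour scale, which is an assumption on my part: the emissions range from 0 to tens of thousands of tons, so a linear scale would show almost every river in the same colour. It runs on a toy table of the five rows shown earlier (sample is a made-up name).

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Toy data: the five rows printed by rivers.head(), not the full dataset.
sample = gpd.GeoDataFrame(
    {"dots_exten": [0.164904, 0.124932, 1.213370, 0.121138, 0.197533]},
    geometry=gpd.points_from_xy(
        [168.79792, 168.34875, 168.33708, 168.02125, 169.81125],
        [-46.58083, -46.44708, -46.41875, -46.35792, -46.34375],
    ),
    crs="EPSG:4326",
)

# A log colour scale cannot show zeros, so keep only rivers with emissions > 0.
positive = sample[sample["dots_exten"] > 0]

fig, ax = plt.subplots(figsize=(8, 4))
positive.plot(
    column="dots_exten",
    ax=ax,
    markersize=20,
    cmap="viridis",
    norm=LogNorm(
        vmin=positive["dots_exten"].min(),
        vmax=positive["dots_exten"].max(),
    ),
    legend=True,  # for a numeric column this draws a colourbar
    legend_kwds={"label": "Plastic emissions (metric tons per year)"},
)
ax.set_xlabel("Longitude (degrees)")
ax.set_ylabel("Latitude (degrees)")
```

Replacing sample with rivers should produce the same kind of figure for the whole dataset, although the marker size will probably need to be reduced.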
We can use .explore() to create an interactive figure of our data.
This figure may be slow to respond to hovering, panning, etc. since the dataset is so large!
# run me
rivers.explore()
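If the interactive map is too sluggish, one option is to explore only the biggest emitters. The helper below, top_emitters, is a hypothetical function written for this sketch (it is not part of GeoPandas), and it is demonstrated on a toy table of the five rows shown earlier.

```python
import geopandas as gpd


def top_emitters(gdf, n, column="dots_exten"):
    """Return the n rows of gdf with the largest values in column."""
    return gdf.sort_values(column, ascending=False).head(n)


# Toy data: the five rows printed by rivers.head(), not the full dataset.
sample = gpd.GeoDataFrame(
    {"dots_exten": [0.164904, 0.124932, 1.213370, 0.121138, 0.197533]},
    geometry=gpd.points_from_xy(
        [168.79792, 168.34875, 168.33708, 168.02125, 169.81125],
        [-46.58083, -46.44708, -46.41875, -46.35792, -46.34375],
    ),
    crs="EPSG:4326",
)

top2 = top_emitters(sample, 2)
print(top2)
```

In the notebook, something like top_emitters(rivers, 1000).explore(column="dots_exten") should give a lighter map with the points coloured by emissions. The cutoff of 1000 is an arbitrary choice for this example.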